Parallel Random Prism: A Computationally Efficient Ensemble Learner for Classification
Authors
Abstract
Classifiers generally tend to overfit when the training data contains noise or missing values. Ensemble learning methods are often used to improve a classifier’s classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism, like any ensemble learner, suffers from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence, even modest-sized datasets may pose a computational challenge for ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
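To make the source of the overhead concrete, the following is a minimal sketch of a parallel ensemble learner, not the paper's implementation. Several points are assumptions made for illustration: the data replication is modelled as bootstrap sampling, a toy one-attribute rule learner stands in for the Prism base classifier, a Python thread pool stands in for the Hadoop cluster workers, and the names bootstrap, induce_one_rule, train_ensemble and predict are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bootstrap(data, rng):
    """Sample len(data) instances with replacement (assumed replication step)."""
    return [rng.choice(data) for _ in data]


def induce_one_rule(sample):
    """Toy base learner (a OneR-style stand-in, NOT Prism): choose the single
    attribute whose value -> majority-class table makes the fewest errors."""
    default = Counter(label for _, label in sample).most_common(1)[0][0]
    best = None
    for attr in range(len(sample[0][0])):
        by_value = {}
        for features, label in sample:
            by_value.setdefault(features[attr], Counter())[label] += 1
        table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(table[f[attr]] != y for f, y in sample)
        if best is None or errors < best[0]:
            best = (errors, attr, table)
    _, attr, table = best
    # Unseen attribute values fall back to the sample's majority class.
    return lambda features: table.get(features[attr], default)


def train_ensemble(data, n_classifiers=10, seed=0, workers=4):
    """'Map'-like step: induce each base classifier on its own sample.
    The inductions are independent, so they can be handed to separate workers."""
    samples = [bootstrap(data, random.Random(seed + i)) for i in range(n_classifiers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(induce_one_rule, samples))


def predict(ensemble, features):
    """'Reduce'-like step: unweighted majority vote over the base classifiers."""
    votes = Counter(clf(features) for clf in ensemble)
    return votes.most_common(1)[0][0]
```

Each base classifier needs its own copy of the data and its own induction run, which is the overhead the abstract describes; because the runs do not depend on each other, they are a natural unit of parallel work. A thread pool only illustrates the structure here; a real speed-up for CPU-bound induction would need separate processes or cluster nodes.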
Similar resources
Random Prism: An Alternative to Random Forests
Ensemble learning techniques generate multiple classifiers, so called base classifiers, whose combined classification results are used in order to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule based classifiers is th...
Random Prism: a noise-tolerant alternative to Random Forests
Ensemble learning can be used to increase the overall classification accuracy of a classifier by generating multiple base classifiers and combining their classification results. A frequently used family of base classifiers for ensemble learning are decision trees. However, alternative approaches can potentially be used, such as the Prism family of algorithms which also induces classification ru...
A Scalable Expressive Ensemble Learning Using Random Prism: A MapReduce Approach
The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers which are affected by noise. The Random Prism classifier has recently ...
Scalable Ensemble Learning and Computationally Efficient Variance Estimation
Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees
Random forest can achieve high classification performance through a classification ensemble with a set of decision trees that grow using randomly selected subspaces of data. The performance of an ensemble learner is highly dependent on the accuracy of each component learner and the diversity among these components. In random forest, randomization would cause occurrence of bad trees and may incl...